Thermodynamic approach for community discovering within the complex networks: LiveJournal study.
The thermodynamic approach of concentration mapping is used to discover communities in the directional friendship network of LiveJournal users. We show that this Internet-based social network has a power-law region in its degree distribution with exponent γ = 3.45. It is also a small-world network with high clustering of nodes. To study the community structure, we simulate the diffusion of a virtual substance immersed in this network as in a multi-dimensional porous system. By analyzing concentration profiles at an intermediate stage of the diffusion process, well-interconnected cliques of users can be identified as groups of nodes with equal values of concentration.
I. INTRODUCTION

In recent years there has been an enormous breakthrough in research on complex networks due to the application of statistical physics methodology […]. Instead of being completely random, many different complex systems prove to have signatures of organization such as clustering and a power-law distribution of links. Together with the small-world property, these are inherent features of an extremely wide variety of systems, such as the World-Wide Web […], the Internet […], collaboration networks of movie actors […] and scientists […], the web of human sexual contacts […] and many others.
Although some concepts of complex network theory were originally introduced in sociology, the statistical study of social networks is complicated by the difficulty of reliable data collection for privacy and ethical reasons. One solution to this problem is the analysis of collaboration networks […], e-mail interactions […], instant messaging […] and online blogging […]. Here we study the basic structural properties of the social network of the LiveJournal blog service and demonstrate a diffusion-motivated method for discovering communities, using this network as a case study.

FIG. 1: Probability density functions of in- and out-degrees for LiveJournal users. The line shows a slope of −3.45, which fits P(k_in) and P(k_out) equally well.



II. LIVEJOURNAL NETWORK 

LiveJournal (LJ) is an online web-based journal service with an emphasis on user interaction [20]. In January 2006 it had 9.3·10^6 users in total, 2.0·10^6 of whom were active in some way according to official LiveJournal statistics [21]. The essential feature of the LJ service is the "friends" concept, which helps users organize their reading preferences and provides security settings for their journal entries and personal data. A user's friends list is public information and can be accessed through a conventional WWW interface or through a dedicated bot interface provided by the LJ system.

Data collection was performed by crawler programs running simultaneously on two computers and exploring the LJ space by following directional friendship links, starting from two users with a large number of incoming friendship links. For each user the crawler obtained his friends list (outgoing links) and the number of users who have the given user in their friends list (incoming links). Each user from the friends list who had not yet been explored by the crawler was added to the end of the processing queue if he was not already there. If the user was already in the queue, his queue score was incremented every time he was found in someone's friends list. Users with higher queue scores were processed first. This ensured fast collection of the essential part of the network. Basically, this algorithm is a modification of Tarjan's depth-first search algorithm for finding the connected component of a graph [22, 23]. The total collection time was 14 days, and the total number of discovered users in the connected component was 3 746 264. We are aware that the network was undergoing continuous changes during the collection. We estimated that the number of users deleted from the LJ database but still present in friends lists was less than 0.1%, which makes us believe that the evolution of the LJ network did not influence our statistics much.
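The crawl order described above can be sketched as a score-prioritized queue. The following minimal Python sketch is an illustration under stated assumptions, not the crawler actually used: friends lists are read from a hypothetical in-memory dict instead of the LJ bot interface, `crawl` is a hypothetical name, and ties between equal scores are broken by the order in which heap entries were pushed.

```python
import heapq
import itertools


def crawl(friends, seeds):
    """Score-prioritized crawl of a directed friendship graph (illustrative sketch).

    friends: dict user -> list of users in his friends list
             (hypothetical stand-in for the LJ bot interface).
    seeds:   users to start from.
    Returns users in the order in which they were processed.
    """
    score = {}                  # queue score of every user currently waiting
    counter = itertools.count() # push counter, breaks ties between equal scores
    heap = []                   # entries: (-score, push_number, user)
    explored = set()
    processed = []

    for s in seeds:
        score[s] = 0
        heapq.heappush(heap, (0, next(counter), s))

    while heap:
        neg_score, _, user = heapq.heappop(heap)
        # Skip stale entries: user already explored, or his score was raised later.
        if user in explored or -neg_score != score.get(user):
            continue
        explored.add(user)
        del score[user]
        processed.append(user)
        for f in friends.get(user, []):
            if f in explored:
                continue
            # Already queued: increment his score; otherwise append him with score 0.
            score[f] = score[f] + 1 if f in score else 0
            heapq.heappush(heap, (-score[f], next(counter), f))
    return processed
```

For example, with links s→p, s→q, s→r and p→r, user r is seen twice before q is processed, so the crawl order is s, p, r, q rather than the breadth-first s, p, q, r.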
The estimated probability distribution functions of in- and out-degree are presented on a log-log scale in Fig. 1. The estimated mean numbers of outgoing and incoming friendship links are ⟨k_out⟩ = 15.91 and ⟨k_in⟩ = 16.07, respectively. The average in-to-out ratio is ⟨k_in/k_out⟩ = 1.157. The number of incoming links is slightly larger than the number of outgoing links because only outgoing links were used for crawler navigation, so some LJ users were unreachable via directional links but were still listed on users' pages.

There are also several technical restrictions on the degrees: the maximum number of friends per user is 750, and only 150 of them can be listed on the user's info page and effortlessly accessed by LJ users. In our experience, the LJ bot interface has some problems listing the users who consider a certain user a friend if there are more than 2500 of them; hence we cut the data at k_in,max = 2500.

As one can see from Fig. 1, the in- and out-degree distributions reveal a power-law decay P(k) ∝ k^{−γ} for k > 100 with exponent γ_in ≈ γ_out = 3.45 ± 0.05, which is surprisingly close to the value γ ≈ 3.4 obtained by Liljeros et al. for sexual contacts [11]. The scaling of the distributions contradicts the results of Nowell et al. […], who reported a parabolic shape of the LJ degree distributions. The skewness of the distributions in our case can be explained by the social origin of the LJ network. As pointed out by Jin et al. [24], degree distributions of social networks do not appear to follow a power law because of the cost, in terms of time and effort, of maintaining friendships. In the case of the LJ network, the cost of friendship is the size of the friends feed, which accumulates all recent entries of the user's friends. We can also distinguish two classes of LJ users: "readers" and "writers". The former mainly use their accounts to read the journals of others. They update their journals only episodically and are not deeply involved in LJ community life. They do not have many incoming and outgoing links, and they are responsible for the skewness of the distributions for k < 100. Meanwhile, active "writers", representing a minority of the registered users, exploit the full capacity of the LJ system. They spend much time participating in LJ community life and have larger numbers of incoming and outgoing links, which are power-law distributed.
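The exponent above was obtained from a straight-line fit in log-log scale. As a cross-check, one could use the continuous maximum-likelihood (Hill-type) estimator γ̂ = 1 + n / Σ ln(k_i/k_min) over the n degrees with k_i ≥ k_min; this is a swapped-in technique, not the fit used for Fig. 1. The sketch below checks it on synthetic data drawn with γ = 3.45 and k_min = 100, values chosen here only to mimic the LJ tail. For real, integer-valued degrees the continuous formula is an approximation, reasonable for a large k_min such as 100.

```python
import math
import random


def powerlaw_mle(degrees, k_min):
    """Continuous MLE of gamma for P(k) ~ k^-gamma, k >= k_min (Hill-type estimator).

    Returns (gamma_hat, standard_error). A substitute for a log-log line fit.
    """
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no degrees at or above k_min")
    s = sum(math.log(k / k_min) for k in tail)
    gamma = 1.0 + n / s
    return gamma, (gamma - 1.0) / math.sqrt(n)


def sample_powerlaw(gamma, k_min, n, rng):
    """Draw n continuous power-law samples by inverse-transform sampling."""
    # CDF: F(k) = 1 - (k / k_min)^(1 - gamma)  =>  k = k_min * (1 - u)^(-1 / (gamma - 1))
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


rng = random.Random(42)
synthetic = sample_powerlaw(3.45, 100.0, 50_000, rng)
gamma_hat, err = powerlaw_mle(synthetic, 100.0)
print(f"gamma = {gamma_hat:.3f} +/- {err:.3f}")
```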

The origin of the power-law region in the distributions can be explained by the continuous evolution and self-organization of the LJ network and by a preferential attachment mechanism similar to the general WWW growth mechanism [25]. Once an interesting journal gets popular, it will be cited and promoted in the journals of its readers, which further increases its popularity. This leads to the "rich-get-richer" effect occurring in many network systems […, 25]. However, linear growth with a linear preferential attachment protocol leads to a power-law degree distribution with γ = 3, which is smaller than the exponent obtained in our study. Larger values of the exponent can be explained by alternative growth mechanisms: preferential attachment with rewiring [26] and the copying mechanism. Rewiring in the LJ system implies that users not only establish new friendship links but also break old ones, while copying occurs when a user inherits part of the friendship connections of his friends. The latter effect is called "transitivity" in sociology and is responsible for the formation of user cliques, or clustering.

We characterize the clustering of LJ users by calculating the clustering coefficient as introduced by Watts and Strogatz […]. It is defined as the number of links between a user's friends divided by the maximum possible number of links between them, averaged over all users in the network. If user i has k_i friends with E_i links between them, the maximum possible number of directed links is k_i(k_i − 1), and the clustering coefficient of user i in a directed network can be defined as:

    C_i = \frac{E_i}{k_i (k_i - 1)}.    (1)

The average clustering coefficient for the whole network, as calculated from our data, is C = ⟨C_i⟩_{i=1,…,N} ≈ 0.3302. It is worth comparing this value with the clustering coefficient of a random directed Erdős–Rényi graph, which can be found as C_rand = ⟨k⟩/(N − 1) and for the LJ network is ca. 4.24·10^−6. The fact that the actual clustering coefficient of the LJ network is nearly five orders of magnitude larger than expected for a randomly linked network of the same degree and size is a clear indication of high user clustering.
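Eq. (1) and the random-graph estimate can be sketched as follows, assuming the network is stored as a dict mapping each user to the set of users in his friends list, with no self-links. Users with fewer than two friends contribute C_i = 0 to the average; this is a convention we assume, since the text does not specify it. The function names are hypothetical.

```python
def clustering_directed(out_links):
    """Average clustering coefficient C = <C_i> of a directed graph, Eq. (1).

    out_links: dict node -> set of nodes it links to (its friends).
    C_i = E_i / (k_i (k_i - 1)), E_i = number of directed links among i's friends.
    """
    total = 0.0
    for i, friends in out_links.items():
        k = len(friends)
        if k < 2:
            continue  # assumed convention: C_i = 0 for fewer than two friends
        # Count directed links a -> b with both a and b in i's friends list.
        e = sum(1 for a in friends for b in out_links.get(a, ()) if b in friends and b != a)
        total += e / (k * (k - 1))
    return total / len(out_links)


def clustering_random(mean_degree, n):
    """Expected clustering of a directed Erdos-Renyi graph: C_rand = <k> / (N - 1)."""
    return mean_degree / (n - 1)
```

For ⟨k⟩ = 15.91 and N = 3 746 264, `clustering_random` gives roughly 4.2·10^−6, consistent with the value quoted above.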

A peculiar feature of the LJ network is the high reciprocity [27] of friendship links. We found that 79.26% of links are bi-directional, which means that this percentage of outgoing links is returned as incoming and, vice versa, the same percentage of incoming links originates from the user's friends. This value is higher than the reciprocity of 57% found for the WWW, which is the technical environment of LJ. The increased reciprocity may be explained by the social origin of the LJ network. Due to the rules of social interaction, user A usually feels obliged to establish a friendship connection to user B if such a connection was already established by B to A. Another explanation for the high reciprocity is that relations in LJ space are often based on real-life relations, which means that LJ users link to other users who are their friends in the real world. In this case the LJ network directly inherits the undirected structure of the underlying social network.
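The reciprocity quoted above is the fraction of directed links whose reverse link is also present. A minimal sketch, assuming links are given as (source, target) pairs and that duplicates and self-links are discarded (our assumption about data cleaning); `reciprocity` is a hypothetical name.

```python
def reciprocity(edges):
    """Fraction of directed links (a, b) for which the reverse link (b, a) exists.

    edges: iterable of (source, target) pairs; duplicates and self-links are dropped.
    """
    links = {(a, b) for a, b in edges if a != b}
    if not links:
        return 0.0
    mutual = sum(1 for a, b in links if (b, a) in links)
    return mutual / len(links)
```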

In order to characterize the small-world properties of the LJ network, we estimated the probability distribution function P_ℓ(ℓ) of the minimum path distance, or hopcount, between nodes through directional links. The results are presented in Fig. 2. The average distance estimated for our data set is ⟨ℓ⟩ = 5.86. Based on the recently obtained expression for the mean distance between nodes in scale-free networks by Hooghiemstra et al. [29], who improved the widely used result of Newman et al. […], the value of ⟨ℓ⟩ can be estimated as follows:

    \langle \ell \rangle \approx \frac{\ln N}{\ln \nu} + \frac{1}{2} - \frac{\gamma_e + \ln\left[\mu/(\nu - 1)\right]}{\ln \nu} - \frac{2e}{\ln \nu},    (2)

where N is the size of the network, μ = ⟨k⟩, ν = ⟨k(k − 1)⟩/⟨k⟩, γ_e ≈ 0.577 is the Euler–Mascheroni constant, and e is the expectation of the logarithm of the limit of a super-critical branching process, which depends on the scaling exponent γ and belongs to the half-open interval (−1, 0], where the lower boundary is a numerical extrapolation of the results from [29] and the upper boundary is the theoretical prediction for γ > 3.

FIG. 2: Probability distribution function of the minimum path length between LiveJournal users through the directional friends links.
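Eq. (2) is easy to evaluate once μ and ν are known. The sketch below returns the two ends of the predicted range for e ∈ (−1, 0], and computes ν = ⟨k(k − 1)⟩/⟨k⟩ from a degree list. The function names are hypothetical, and ν itself is not reported in the text, so it has to be measured from the data.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant gamma_e


def branching_factor(degrees):
    """nu = <k(k - 1)> / <k>, computed from a list of node degrees."""
    return sum(k * (k - 1) for k in degrees) / sum(degrees)


def mean_distance_range(n, mu, nu):
    """Range of <l> predicted by Eq. (2) for e in (-1, 0].

    n: network size N; mu: <k>; nu: <k(k-1)>/<k> (must exceed 1).
    Returns (value at e = 0, limit as e -> -1).
    """
    ln_nu = math.log(nu)
    base = (math.log(n) / ln_nu + 0.5
            - (EULER_GAMMA + math.log(mu / (nu - 1.0))) / ln_nu)
    # The -2e/ln(nu) term adds between 0 (e = 0) and 2/ln(nu) (e -> -1).
    return base, base + 2.0 / ln_nu
```

In practice one would call `mean_distance_range(3746264, 15.91, branching_factor(degrees))` with the measured degree list.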

For the LJ data, Eq. (2) gives the following range of the mean distance: 4.53 < ⟨ℓ⟩_th < 5.05, which is in any case smaller than the statistically obtained value. This theoretical prediction assumes homogeneity of the graph, and we believe that a possible reason for this underestimation of the mean path length is the macroscopic structuring of the network, which is discussed below.
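The empirical hopcount distribution P_ℓ(ℓ) and its mean can be obtained with breadth-first searches along directed links. The sketch below assumes an adjacency dict and a list of BFS roots; for a network of millions of nodes a random sample of roots would be the practical choice (an assumption, since the sampling behind Fig. 2 is not described). The function name is hypothetical.

```python
from collections import Counter, deque


def hopcount_distribution(out_links, sources):
    """Empirical P(l) of shortest directed path lengths from the given BFS roots.

    out_links: dict node -> iterable of successors.
    sources:   BFS roots (all nodes for an exact result, or a sample).
    Unreachable pairs and zero-length self-distances are excluded.
    Returns (pdf as {l: probability}, mean distance).
    """
    counts = Counter()
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in out_links.get(u, ()):
                if v not in dist:          # first visit = shortest hopcount
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts.update(d for d in dist.values() if d > 0)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    pdf = {l: c / total for l, c in sorted(counts.items())}
    mean = sum(l * p for l, p in pdf.items())
    return pdf, mean
```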

III. COMMUNITY DISCOVERING METHOD 

It seems quite natural for the nodes of complex networks to aggregate into macroscopic structures with high internal link density and weak connections to the rest of the network. Such groups are often referred to as communities. The particular reasons for community formation may depend on the type of network, but this feature has proved to be quite universal and can be found in social, biological and computer networks […]. Finding these structures within the network is a major step towards understanding its topology.

This problem is known as the graph-partitioning problem in graph theory. It is NP-complete, which makes exact solutions practically inapplicable to large networks.

Recent advances in the study of complex networks have stimulated the search for alternative techniques for community discovery, and many original solutions have been proposed […]. These algorithms can be divided into two main classes: divisive algorithms, which hierarchically split the network by removing the edges with the highest betweenness [30, 33], and agglomerative algorithms, which start from the maximal community division, where each node belongs to its own separate community, and successively merge these communities based on some measure of node similarity [35] or by optimizing the partitioning. In their recent work, Clauset et al. […] used greedy optimization to maximize the modularity, a measure of partitioning quality [31, 33]. Currently this method is one of the fastest and runs in time O(MH ln N), where M = ⟨k⟩N is the number of edges in the network and H is the number of decomposition levels, which is usually small (H = O(ln N)) [35]. In a sparse network the degree is bounded, so M = O(N) and the complexity is O(N ln² N), which makes it the fastest method available today.

FIG. 3: Illustration of the community detection algorithm. After the diffusion process starts from the initiator node, virtual ink propagates through the network links. Communities can be recognized as groups of nodes with similar amounts of ink.

Here we propose a method for finding communities based on the principles of thermodynamics. When a system becomes large enough, the behavior of its microscopic constituents can be successfully averaged to provide a basis for a scientific description of phenomena that avoids microscopic details. Since in thermodynamics the behavior of a system can be described without solving the equations of motion of every constituent molecule, we believe that the structure of a large complex network can be explored without an explicit solution of the graph-partitioning problem.

Our current study is based on the simulation of a mass diffusion process in the complex network, treated as a multi-dimensional porous system with directional links in which the flow follows physical laws. The diffusion process, initiated at one of the nodes by the addition of virtual ink, produces a non-uniform mass distribution at the intermediate stage, which can be used to reveal well-interconnected communities within the complex network by selecting the nodes with similar concentration values. In this sense our method falls into the class of agglomerative techniques, with concentration as the similarity measure. However, it can be shown that the quantity r_AB = |ln φ_A − ln φ_B|, where φ_A and φ_B are the concentrations in nodes A and B, can serve as a measure of the distance between these nodes. Thus edge betweenness, characterized as the drop of the logarithm of concentration along the edge, can be used for hierarchical decomposition of the network.

A similar measure of distance between nodes, based on random walks, has recently been introduced by Pons et al. for undirected networks. It is defined as the difference between the probabilities for a random walker starting from some node Z to reach nodes A and B in a certain number of steps t. As these probabilities for large t are mainly determined by the degrees of the nodes, the distance values should be normalized. A suitable short number of steps t may depend on the particular network and must be known in advance. Pons et al. also pointed out conceptual difficulties in applying the random-walk scheme to directed networks […]. Several other diffusion-motivated approaches proposed recently (e.g. […]) are more or less consistent with the random-walk analogy.

In our model we break the similarity with classical random walks and the theory of flows in graphs in favour of a realistic physical picture. First, we allow nodes to accumulate substance by assigning them infinite maximum capacity. A direct flow from node A to node B is possible if there is a directed link from A to B and φ_A > φ_B. The flow rate in this case depends on the concentration difference φ_A − φ_B and on the out-degree k_out of node A. If φ_A < φ_B, no mass is delivered directly from A to B. In the limit of infinite time these rules lead to an equilibrium state with an equal mass distribution, which meets physical expectations.

Network links in our realization represent pipes (Fig. 3); directed links act as pipes allowing mass to pass in one direction only. Mass propagation within the network system is driven by Fick's law of diffusion:

    dM = -D \frac{\partial \phi}{\partial x} \, dS \, dt,    (3)

where dM is the mass change, D is the diffusion coefficient, ∂φ/∂x is the concentration gradient, dS is an area element and dt is a time interval.

For our discrete system this implies that the rate of mass exchange between neighbouring nodes is proportional to the difference of the masses in these nodes. Every node uses its outgoing links to deliver mass to its neighbours holding a smaller amount of ink. The amount of ink Δ_out M_i delivered by the node to its i-th neighbour is:

    \Delta_{out} M_i = -\frac{\alpha}{k_{out}} (M_0 - M_i),    (4)

where M_0 is the mass of the delivering node, M_i is the mass of its i-th neighbour, M_0 > M_i, and α is the coefficient determining the transfer rate, which is constant for all nodes. We analyze the mass M contained in a node instead of the concentration φ, assuming that all nodes have the same geometrical volume. The total mass delivered by a node is:

    \Delta_{out} M = \sum_{i=1}^{k_{out}} \Delta_{out} M_i = -\alpha \left( M_0 - \frac{1}{k_{out}} \sum_{i=1}^{k_{out}} M_i \right) = -\alpha (M_0 - \bar{M}),    (5)

where \bar{M} is the mean ink mass in the neighbouring nodes with smaller masses. Mass transfer in a pipe happens instantaneously. Thus we can apply the mass conservation law and increase the mass in the neighbouring nodes by the amount taken from the node:

    \Delta_{out} M = -\sum_{i=1}^{k_{out}} \Delta_{in} M_i,    (6)

    \Delta_{in} M = -\sum_{i=1}^{k_{in}} \Delta_{out} M_i,    (7)

where in Eq. (6) Δ_in M_i is the mass received by the i-th out-neighbour, and in Eq. (7) Δ_out M_i is the mass sent towards the node by its i-th in-neighbour.

The total change of mass at a node is composed of the loss of mass due to diffusion to neighbours through outgoing links and the gain of mass delivered from neighbours through incoming links: ΔM = Δ_in M + Δ_out M. This conservation law is an extension of Kirchhoff's law [39] to a node with non-zero capacity.

In order to avoid asymmetry due to sequential node processing, the mass changes for all nodes were first calculated without actually changing the masses, and then the masses in all nodes were updated. In the special case of no outgoing links (Δ_out M = 0), the node acts as a virtual ink absorber, which can only gain ink from its neighbours and has no way to deliver it back. Nodes without incoming links are not considered, since they are invisible to the data-collecting crawler and are thus absent from our database.
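One synchronous update of the diffusion rules of Eqs. (4)–(7) might look like the sketch below. It assumes masses are kept in a dict and applies the per-neighbour flow (α/k_out)(M_0 − M_i) to every out-neighbour holding less ink. All changes are accumulated first and applied afterwards, as described above, and a node without out-links simply keeps what it receives (the absorber case). The function name is hypothetical.

```python
def diffusion_step(masses, out_links, alpha):
    """One synchronous update of the virtual-ink diffusion rules.

    masses:    dict node -> ink mass M (all nodes assumed to have equal volume).
    out_links: dict node -> list of out-neighbours (one-way pipes).
    alpha:     transfer coefficient from the half-open interval (0, 1].
    Returns a new dict; the input is left unchanged, so node order does not matter.
    """
    delta = dict.fromkeys(masses, 0.0)
    for node, m0 in masses.items():
        targets = out_links.get(node, [])   # empty list -> pure absorber
        k_out = len(targets)
        for nb in targets:
            mi = masses[nb]
            if m0 > mi:                      # ink flows only towards lower mass, Eq. (4)
                flow = alpha / k_out * (m0 - mi)
                delta[node] -= flow          # contribution to Delta_out M of the sender
                delta[nb] += flow            # contribution to Delta_in M of the receiver
    return {node: m + delta[node] for node, m in masses.items()}
```

Because every unit removed from a sender is added to a receiver, the total mass is conserved exactly, in line with Eqs. (6) and (7).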

We start by putting an initial amount of ink equal to N mass units into one of the nodes, which we call the initiator. Subsequently the system is allowed to proceed towards the equilibrium state by continuous mass redistribution within the network according to our rules. The expected equilibrium state of a connected network is an equal distribution of this total mass among the nodes, so that each of them ends up with N/N = 1 mass unit. While evolving towards this state, the system passes through non-equilibrium states with non-uniform mass distributions.

Imagine a cluster of well-connected nodes inside the network, connected to the outside world by only a few outgoing and incoming links. Ink diffusion inside the cluster is relatively fast due to the large number of exchange channels between the members and the high conductivity of the channel ensemble. The limited number of channels leading outside the cluster forms a bottleneck for mass delivery. Under these conditions the flow rate between members is much higher than between members and non-members, and the dispersed ink is likely to form an equi-concentration volume within the cluster. Each cluster in this system, with its specific connection properties such as flow rate and distance from the initiator, would have the same concentration of ink in each of its nodes, with a value specific to that cluster. Thus, by estimating the probability distribution function of concentration, one can analyze the non-uniformity of the ink distribution and reveal separated clusters by detecting the signatures of equi-concentration volumes.

FIG. 4: Dynamics of the relative concentration change in the initiator node doctorjivsy for different flow rates α. The inset shows rescaled data. Oscillatory parts were cut away.
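To illustrate how equi-concentration volumes appear, the sketch below runs the update rule on a hypothetical toy network: two fully connected 20-node groups joined by a single bidirectional link a1 ↔ b1. The initiator a0 receives N = 40 units of ink, and α = 0.1 as in Fig. 5. The graph, the number of steps and the helper names are our choices for illustration only.

```python
import math


def diffusion_step(masses, out_links, alpha):
    """Synchronous update: each node sends (alpha / k_out) * (M_0 - M_i) to every
    out-neighbour i holding less ink; all changes are applied at once."""
    delta = dict.fromkeys(masses, 0.0)
    for node, m0 in masses.items():
        targets = out_links.get(node, [])
        for nb in targets:
            if m0 > masses[nb]:
                flow = alpha / len(targets) * (m0 - masses[nb])
                delta[node] -= flow
                delta[nb] += flow
    return {node: m + delta[node] for node, m in masses.items()}


def two_clique_graph(size):
    """Toy directed network: two complete groups of `size` nodes (size >= 2)
    joined by one bidirectional bridge a1 <-> b1."""
    group_a = [f"a{i}" for i in range(size)]
    group_b = [f"b{i}" for i in range(size)]
    out_links = {}
    for group in (group_a, group_b):
        for u in group:
            out_links[u] = [v for v in group if v != u]
    out_links["a1"].append("b1")
    out_links["b1"].append("a1")
    return out_links, group_a, group_b


def simulate(out_links, initiator, alpha, steps):
    """Put N units of ink (N = number of nodes) into the initiator and iterate."""
    masses = dict.fromkeys(out_links, 0.0)
    masses[initiator] = float(len(out_links))
    for _ in range(steps):
        masses = diffusion_step(masses, out_links, alpha)
    return masses


links, A, B = two_clique_graph(20)
m = simulate(links, "a0", alpha=0.1, steps=100)
for name, group in (("initiator group", A), ("other group", B)):
    logs = [math.log10(m[u]) if m[u] > 0 else float("-inf") for u in group]
    print(f"{name}: log10 M from {min(logs):.2f} to {max(logs):.2f}")
```

After 100 steps every node of the initiator's group holds close to 2 units, while every node of the other group holds far less, so a histogram of log M shows two separated groups of values. Running many more steps slowly drives all nodes towards 1 unit each.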

The flow rate α in Eq. (4) can be selected from the half-open interval (0, 1] and defines the speed of the simulation. Values larger than 0.5 are not desirable because in some cases they can cause concentration waves or back-reflections.

The proposed method does not aim to decompose the whole network into minimal clusters, but rather to reveal significant clusters within the network. Since we regard the network as an open system that does not have to be fully described by the existing database, we do not assign a measure of clustering to the whole network, such as the modularity proposed by Newman […]. However, we can quantify the isolation of an individual community i by the confinement parameter K_i, which characterizes the assortative mixing of that community. Using the notation of Newman [35], we define K_i as follows:

    K_i = \frac{e_{ii}}{a_i} = \frac{e_{ii}}{\sum_j e_{ij}},    (8)

where e_ij is the fraction of network edges connecting nodes of community i to community j, and a_i = Σ_j e_ij is the fraction of edges starting from the members of i. Thus the parameter K_i gives the number of links connecting nodes inside community i as a fraction of the total number of links originating from the members of i.
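Because e_ii and a_i in Eq. (8) are both fractions of the same total number of edges, K_i reduces to a ratio of link counts. A minimal sketch, assuming the links are given as directed (source, target) pairs and the community as a set of nodes; `confinement` is a hypothetical name.

```python
def confinement(edges, community):
    """K_i = e_ii / a_i, Eq. (8): share of links leaving members of `community`
    that end inside it.

    edges:     iterable of directed (source, target) pairs.
    community: set of member nodes.
    """
    internal = 0   # links with both ends in the community (numerator of e_ii)
    outgoing = 0   # all links starting in the community (numerator of a_i)
    for u, v in edges:
        if u in community:
            outgoing += 1
            if v in community:
                internal += 1
    if outgoing == 0:
        raise ValueError("community has no outgoing links")
    return internal / outgoing
```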
FIG. 5: Probability distribution functions of the virtual ink concentration M at two stages of the diffusion process with α = 0.1 and doctorjivsy as the initiator node. The inset represents the same data on a linear scale. Two well-pronounced peaks of two separated communities are clearly seen.

FIG. 6: Dynamics of the virtual ink distribution within the LJ network, shown as a logarithmically color-coded probability distribution function of the ink concentration (vertical axis) versus the simulation step (horizontal axis). The separation of the Russian-speaking community (thin upper line, high concentration values) from the general English-speaking one (thicker lower line, lower concentration values) can be clearly seen.



IV. RESULTS AND DISCUSSION 

To test our method we performed ink diffusion simulations on our LJ database, starting from different initiator nodes. Fig. 4 shows the relative mass decay as a function of the simulation step number T for flow rates α = 0.1, 0.25 and 0.5. User doctorjivsy, with a high number of incoming links, was chosen as the initiator node. As we will show later, this user belongs to an extremely confined Russian-speaking community. The inset of Fig. 4 shows the same data rescaled with respect to α. As one can see from the match of the rescaled curves, the dynamics of the process does not depend on the flow rate α in this range. The striking feature of the presented data is the obvious step-like form of the curves, which is an effect of the non-homogeneous structure of the LJ network. Flat parts of the ΔM/M curves correspond to exponential decays of M, which is a sign of unrestricted diffusion of ink. The first significant drop of the decay rate happens at Tα ≈ 5, which is equal to twice the radius of the community to which our initiator belongs. This corresponds to the moment when the virtual ink fills the whole community and further expansion of the filled area is impeded by the limited number of links leading outside the community. So if it takes T_0 simulation steps for the virtual ink to reach the borders of the community, it also takes T_0 simulation steps for the decay of the concentration gradient to reach the initiator node, and together this gives twice the size of the community. The second drop at Tα ≈ 22 is not well pronounced and corresponds to the filling of the whole network.

As our community discovering algorithm is based on the detection of equi-concentration volumes, we calculated the probability distribution function of M at two stages of virtual ink diffusion for α = 0.1 (Fig. 5). One can see two well-pronounced peaks in all plots, which turned out to be the Russian-speaking community (larger values of mass M) and the rest of the LJ network (broader peak at smaller values of M).

The dynamics of the virtual ink distribution is presented in Fig. 6. As can be seen, a distinct separation of the Russian community peak from the main peak forms before step Tα = 50. At later stages it is quite stable and easily distinguishable up to iteration Tα = 10^…, which gives quite a long quasi-stationary stage that can be used for community detection. It also demonstrates that the formation of equi-concentration volumes is much faster than the relaxation of the whole system.

If the initiator node is selected somewhere outside the community, the splitting of the distribution peak is also observed, but in this case the average concentration within the Russian community is smaller than in the rest of the LJ nodes. This supports the expectation that a community with a limited number of outgoing links also lacks incoming links.

The accuracy of the community discovering scheme can be improved by simultaneously simulating diffusion from two or more initiator nodes. Here we assigned two independent concentration values to each node. All diffusion processes proceed without influencing each other. The LJ network can now be mapped as a probability distribution function of two concentrations, and thus a community can be localized on a two-dimensional plot, as shown in Fig. 7 for doctorjivsy and future_visions as the initiator nodes. One can see two main separated peaks corresponding to the major part of the LJ network and to the Russian-speaking community. The abundance of noise-like spots on the map corresponds to small, well-separated and well-linked communities existing in the network, which are well localized.

FIG. 7: Two-dimensional map of the LJ user network obtained from the concentration configurations of independent diffusion processes from two initiator nodes at the stage Tα = 100.

Nodes from a certain community can be selected by simple thresholding of the values of both concentrations. The group of nodes with concentration values within the selected range that forms a connected component in the network can be identified as the community. The ratio of the number of connected nodes to the total number of users with concentrations within the range defines the specificity of the method.
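The thresholding and specificity computation can be sketched as follows. The selection window on the two-dimensional concentration map is supplied by hand, links are treated as undirected when testing connectivity, and the largest connected group inside the window is taken as the community; these are our assumptions, since the text does not fix them, and `select_community` is a hypothetical name.

```python
from collections import deque


def select_community(conc1, conc2, box, out_links):
    """Threshold two concentration maps and keep the largest connected group.

    conc1, conc2: dict node -> concentration from two independent diffusion runs.
    box:          ((lo1, hi1), (lo2, hi2)) selection window on the 2D map.
    out_links:    dict node -> iterable of successors (treated as undirected here).
    Returns (community_nodes, specificity).
    """
    (lo1, hi1), (lo2, hi2) = box
    selected = {u for u in conc1
                if u in conc2 and lo1 <= conc1[u] <= hi1 and lo2 <= conc2[u] <= hi2}
    # Undirected adjacency restricted to the selected nodes.
    adj = {u: set() for u in selected}
    for u in selected:
        for v in out_links.get(u, ()):
            if v in selected and v != u:
                adj[u].add(v)
                adj[v].add(u)
    # Breadth-first search for connected components; keep the largest one.
    best, seen = set(), set()
    for s in selected:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    specificity = len(best) / len(selected) if selected else 0.0
    return best, specificity
```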

As the complete analysis of the LJ community structure, as well as the reasons for community formation, is beyond the scope of the current paper, we will not list all user cliques found. However, in Table I we list the largest LJ community and two smaller ones together with their parameters. The size of the discovered Russian-speaking community is of the order of the total number of LJ users from the Russian Federation according to LJ database statistics [21] (232 241 users in January 2006). The obvious reason for the separation of this community, with a very high confinement value K = 98.34%, is the prevailing use of the Russian language. We found by separate analysis of info pages and journal entries that 92% of the users within this community use the Cyrillic alphabet. The fact that the Russian LJ community differs from the rest of the LJ network has already been pointed out by Internet observers (e.g. Ref. […]). The two other listed communities are examples of the surprisingly popular class of Role-Playing Game communities, formed by virtual users who play characters and write their journals on behalf of these characters.
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TABLE I: Examples of discovered communities within Live Journal userspace. 



Representing node 


Number of users 


Specificity 


Confinement K 


Comments 


doctorjivsy 


227314 


99.89% 


98.34% 


Russian speaking community" 


future_visions 


421 


98.36% 


96.22% 


Fandom High Role-Playing Game community 


alected 


262 


99.21% 


99.10% 


Loviosa Role-Playing Game community 



"92% of users have Cyrillic letters in their information pages or 
journals 



V. CONCLUSIONS 

The LiveJournal friendship network was studied with the general approach developed for complex networks, and a power-law tail with exponent γ = 3.45 was found in the degree distributions. The network also demonstrates the small-world property and high clustering.

To study the community structure we used an original thermodynamic approach. We found that diffusion in the essentially non-Euclidean geometry of a complex network with community structure leads to a peculiar phenomenon, the formation of quasi-stationary equi-concentration volumes, as shown by our simulation. This proves to be very useful for detecting well-interconnected groups of nodes. With a limited number of parallel diffusion processes, sufficient for a rough decomposition, our method has O(N ln N) complexity: each simulation step processes M = ⟨k⟩N edges, which for a sparse network gives M = O(N), and the required number of steps is proportional to the diameter of the network, which is O(ln N). It is currently one of the fastest algorithms and was applied to a huge directed network of LJ users containing several million nodes. Obtaining the results presented in this paper takes only one or two hours of desktop computer time. Moreover, this method can be applied locally to a specific part of the network even without complete information about distant parts of the network. The sensitivity of the decomposition can be tuned by increasing the number of initiator nodes, with the limit of complete decomposition when every node acts as the initiator of its own diffusion process.